A Combined Theory Data-Driven Approach to Classifying Delinquency Risk in the Future of Families and Child Wellbeing Study

About Me


Name: Nicholas Vietto


PhD Candidate at the University of Nebraska - Omaha


Research Interests: Biopsychosocial Criminology, Quantitative Methods, Data Visualization, Open-Science, Open-Source Software


Prior Work on Risk for Delinquency


Prior studies of risk for delinquency show that risk factors across multiple domains are associated with increased risk for delinquency.


Studies in this area commonly show individual and socio-environmental differences are associated with risk for delinquency

· Individual - cognitive (e.g., IQ) and trait measures

· Socio-environmental – parents, peers, and communities


A smaller body of research has shown that genetic variation is also associated with risk for delinquency

· Genetic variation associated with dopaminergic and serotonergic function

· Often in interaction with environmental risk factors (e.g., childhood adversity)

Why Machine Learning?


Computational methods used in prior work on risk for delinquency have methodological limitations, including an overreliance on mass-univariate testing (Dwyer et al., 2018).


Analyses with Machine Learning can improve our understanding of risk for delinquency:

Advanced Data Processing: Efficiently handles and analyzes large amounts of data to enhance predictive power.

Uncovering Complex Relationships: Identifies non-linear and higher-order interactions, especially in high-dimensional datasets (e.g., image or audio data), providing deeper insight into relationships among variables.

Enhanced Predictive Accuracy: Continuously refines predictions through iterative learning, improving overall accuracy over time.

Chan and Colleagues (2023)



Using data from the ABCD Study, Chan et al. (2023) applied a Feed-Forward Neural Network to classify conduct disorder (CD) in children, utilizing a multidomain approach.


Their findings revealed that a model incorporating social, psychological, and biological factors outperformed single-domain models in predicting CD, achieving 91.18% accuracy and an AUC of 0.957.
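As a reminder of what the two reported metrics measure, here is a minimal Python sketch. It is not the code used by Chan et al. (2023) or in this study; the function names and the toy labels and probabilities are made up for illustration. Accuracy is the share of cases classified correctly at a threshold (0.5 is assumed here), and AUC is computed as the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Share of cases whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == y) for p, y in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC via pairwise comparison: P(positive outranks negative), ties count 0.5."""
    pos = [p for p, y in zip(y_prob, y_true) if y == 1]
    neg = [p for p, y in zip(y_prob, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities (hypothetical, not study data).
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.60, 0.90]

print(accuracy(y_true, y_prob))
print(roc_auc(y_true, y_prob))
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the two are usually reported together.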

The Current Study


Generalizes the approach of Chan et al. (2023) to risk for delinquency.

Using the Future of Families and Child Wellbeing Study (FFCWS):

  • Expanded Sociological Domain: Incorporates rich socio-environmental predictors, including census tract variables, labor market measures, and proximity to gun-violence incidents.

  • Genetic Data: Incorporates genes involved in the serotonergic and dopaminergic pathways to examine the role of polymorphic variation.

  • Outcome: Classifies delinquency risk rather than a CD diagnosis.

Future of Families and Child Wellbeing Study (FFCWS)

Machine Learning Model Development


Framework:

Feed-Forward Neural Network using the {tidymodels} framework in R

Data Spending:

2128 observations

60/20/20 (1276/426/426) split for training, validation, and testing
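The split sizes above can be reproduced with simple arithmetic. The following Python sketch is an illustration only, not the {tidymodels} code used in the study: it assumes the training share is rounded down and the remaining cases are divided evenly between validation and testing. The function name and the seed are hypothetical.

```python
import random


def three_way_split(n, train_prop=0.6, seed=2024):
    """Shuffle row indices and cut them into train / validation / test sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = int(n * train_prop)         # 60% of rows, rounded down
    n_val = (n - n_train) // 2            # half of the remainder
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train, val, test = three_way_split(2128)
print(len(train), len(val), len(test))  # 1276 426 426
```

Holding out a validation set separately from the test set lets tuning decisions be made without touching the data used for the final performance estimate.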

Feature Engineering Steps:

  • Dummy coded categorical predictors
  • Normalization of predictors
  • Dimensionality reduction via Unsupervised PCA
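The first two steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the study's preprocessing code: the helper names and the toy values are hypothetical, the first sorted level is assumed to be the reference category, and normalization is assumed to mean z-scoring with statistics estimated on the training data only. The PCA step is omitted to keep the example dependency-free.

```python
from statistics import mean, stdev


def dummy_code(values):
    """One 0/1 indicator column per level, dropping the first (reference) level."""
    levels = sorted(set(values))
    return {f"is_{lvl}": [1 if v == lvl else 0 for v in values] for lvl in levels[1:]}


def fit_normalizer(train_values):
    """Learn mean and SD on training data; return a function that z-scores any data."""
    m, s = mean(train_values), stdev(train_values)
    return lambda xs: [(x - m) / s for x in xs]


# Hypothetical categorical predictor (toy genotype labels).
genotype = ["LL", "SL", "SS", "SL"]
dummies = dummy_code(genotype)
print(dummies)

# Hypothetical numeric predictor: fit on training rows, then apply to new rows.
train_scores = [2.0, 4.0, 6.0]
normalize = fit_normalizer(train_scores)
scaled_train = normalize(train_scores)
scaled_new = normalize([8.0])
print(scaled_train, scaled_new)
```

Estimating the mean and standard deviation on the training rows and reusing them for validation and test rows avoids leaking information from held-out data into the model.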

Biopsychosocial Risk Factors


Socio-Environmental Domain

  • Parental Monitoring Scale (Focal Child, Year 15)

  • Neighborhood Collective Efficacy Scale (Focal Child, Year 15)

  • Conflict Tactics Scale (Focal Child, Year 15)

  • Material Hardship Scale (PCG, Year 15)

Psychological Domain

  • BSI 18 Anxiety Scale (Focal Child, Year 15)

  • Center for Epidemiologic Studies Depression Scale (CES-D) (Focal Child, Year 15)

  • Dickman’s Impulsivity Scale (Focal Child, Year 15)

Genetic Domain

  • SLC6A4 Gene (Serotonin Transporter Gene)
    • 5-HTTLPR
    • STin2
  • TPH2 Gene (Tryptophan Hydroxylase 2 Gene)
    • rs4570625
    • rs1386494

Sample Descriptive Statistics

Predictor Descriptive Statistics

Outcome Descriptive Statistics


Preliminary Results (Socio-Environmental)

Preliminary Results (Psychological)

Preliminary Results (Genetic)

Preliminary Results (Biopsychosocial)

Preliminary Results

Limitations


  • Genetic Data Constraints: Genetic information is confined to markers from the candidate gene era, potentially limiting genomic coverage.

  • Sample Size: The sample is small relative to typical machine learning applications, which may limit the robustness and generalizability of the results.

  • Age of Assessment: Age 15 may be early for assessing delinquency risk, as behaviors predictive of long-term patterns may not yet be fully evident.

Future Directions



  • Enhance Domain Optimization: Add features to maximize the model’s performance in each specific domain (e.g., adding labor markets for distal predictors in the sociological domain).

  • Evaluate Fairness Across Ethnicities: Assess the final model’s performance across different ethnic groups to ensure fairness, verifying it does not exhibit biases against social or minority groups.

  • Test model on Year 22 data: Validate the model’s performance on the Year 22 data to assess its generalizability and predictive power.

  • Treat delinquency as a continuous outcome rather than using a categorical classification model.

Questions?

Supplemental Materials

Network Structure (12-4-1, 57 weights)
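The count of 57 weights is consistent with a fully connected network in which every hidden and output unit has one weight per incoming unit plus one bias term: (12 + 1) × 4 + (4 + 1) × 1 = 52 + 5 = 57. A short Python check of that arithmetic, assuming this bias convention (the function name is hypothetical):

```python
def count_parameters(layer_sizes):
    """Weights plus biases for a fully connected feed-forward network."""
    return sum(
        (n_in + 1) * n_out  # +1 accounts for each receiving unit's bias
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


print(count_parameters([12, 4, 1]))  # 57
```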

Confusion Matrix (Biopsychosocial)

Model Calibration

The Tale of Two Cultures

Data Modeling Culture


Primary Focus: Deriving causal inference

Approach: Emphasizes deductive reasoning

Process: Models the data-generating process to clarify relationships between X and Y

Culture: Grounded in methodologies developed primarily by statisticians

Algorithm Modeling Culture


Primary Goal: Maximizing predictive accuracy

Approach: Emphasizes inductive reasoning, with a focus on learning patterns directly from data

Process: Utilizes black-box models to capture relationships between X and Y

Culture: Rooted in methodologies developed primarily by computer scientists

Machine Learning Workflow using the {tidymodels} framework in R


Neural Networks

